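The table below shows the first six rows of the niveau data set.
The data set contains the Aare’s daily maximum water levels at a single station in Stilli, Untersiggenthal (canton of Aargau), together with the exact time at which each daily maximum was recorded. Column names and values are kept in German, as in the source data: Pegel is the water level, Tagesmaxima are daily maxima, m ü.M. means metres above sea level, and "Freigegeben, validierte Daten" means released, validated data.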
| Stationsname | Stationsnummer | Parameter | Zeitreihe | Parametereinheit | Gewässer | Zeitstempel | Zeitpunkt_des_Auftretens | Wert | Freigabestatus |
|---|---|---|---|---|---|---|---|---|---|
| Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-01 00:00:00 | 2000-01-01 00:23:00 | 326.245 | Freigegeben, validierte Daten |
| Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-02 00:00:00 | 2000-01-02 00:43:10 | 326.153 | Freigegeben, validierte Daten |
| Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-03 00:00:00 | 2000-01-03 00:00:00 | 326.053 | Freigegeben, validierte Daten |
| Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-04 00:00:00 | 2000-01-04 01:43:40 | 325.871 | Freigegeben, validierte Daten |
| Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-05 00:00:00 | 2000-01-05 21:23:00 | 325.837 | Freigegeben, validierte Daten |
| Untersiggenthal, Stilli | 2205 | Pegel | Tagesmaxima | m ü.M. | Aare | 2000-01-06 00:00:00 | 2000-01-06 03:33:05 | 325.835 | Freigegeben, validierte Daten |
The plot below shows the evolution of water levels over roughly 21 years, from 01-01-2000 to 01-08-2021.
First, water levels fluctuate considerably within each year, which suggests seasonality. Second, the lowest troughs are fairly constant over the years (about 325.5 m above sea level), whereas a few peaks stand out clearly from the rest. The highest peaks occur mainly in the summer months: 328.827 m above sea level on 25-08-2005, 329.323 m on 09-06-2007 and 328.622 m on 14-07-2021. Over the period, three years therefore had unusually high summer water levels: 2005, 2007 and 2021.
The following output shows the summary statistics of the water level values (minimum, quartiles, median, mean and maximum). The values are not symmetric around the mean: they are much more tightly grouped to the left of the mean (and the median) than to the right. The maximum (329.3) lies about 3.5 m above the median (325.8), while the minimum (325.4) lies only 0.4 m below it, which indicates that the distribution of water levels is right-skewed.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 325.4 325.6 325.8 325.9 326.1 329.3
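As a rough illustration of how such a summary can be computed, the sketch below uses only the six `Wert` values from the table above. This is an assumption made for the example: the figures it prints will differ from the output for the full series, and the helper name `summarize` is hypothetical.

```python
import statistics

# First six daily maxima (column Wert, m ü.M.) copied from the table above.
# Assumption: this tiny sample only illustrates the calculation; it is not the full series.
levels = [326.245, 326.153, 326.053, 325.871, 325.837, 325.835]


def summarize(values):
    """Return min, quartiles, median, mean and max of a list of numbers."""
    # method="inclusive" treats the sample min and max as the 0th and 100th percentiles
    # and interpolates linearly between neighbouring order statistics.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
    }


summary = summarize(levels)
for name, value in summary.items():
    print(f"{name:>6}: {value:.3f}")

# A mean above the median is one simple sign of a longer right tail.
print("mean > median:", summary["mean"] > summary["median"])
```

With only six observations the comparison of mean and median says little on its own; the same check on the full series is what supports the right-skew reading.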
The histogram below (i.e. the frequency distribution) confirms these observations. The data are right-skewed (and therefore non-normal): most values are concentrated at low water levels, and a long right tail contains the unusually high values that need to be investigated.
On top of the histogram we add a red curve, a smoothed version of the histogram that again shows the shape of the distribution, a purple line marking the median water level (the value that splits the observations in half) and a green line marking the mean water level.
Looking at the positions of the mean and the median on the x-axis [add mode at some point], the mean seems to be the better indicator of the center of the distribution. [adapt interpretation when mode is on].
[peaks-over-threshold method] The Peaks-over-Threshold method identifies extreme values as those above a designated threshold u. To choose u, we use the mean residual life (MRL) plot, which shows the mean excess over u against u, and then look at the distribution of the data points. The lowest value of u above which the plot is approximately linear can generally be selected as the threshold. To model the high water levels, we therefore first draw an MRL plot to choose the threshold and then apply the Peaks-over-Threshold method.
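The quantity behind an MRL plot is the mean excess: for a given u, the average of (x - u) over all observations x above u. The sketch below computes it for a few candidate thresholds. It is a minimal illustration, not the analysis itself: the data are the six `Wert` values from the table above, and the function name `mean_excess` and the threshold grid are assumptions made for the example.

```python
def mean_excess(values, u):
    """Average amount by which the values above u exceed u, or None if none exceed u."""
    excesses = [x - u for x in values if x > u]
    if not excesses:
        return None
    return sum(excesses) / len(excesses)


# First six daily maxima (column Wert, m ü.M.) from the table above; toy input only.
levels = [326.245, 326.153, 326.053, 325.871, 325.837, 325.835]

# Hypothetical grid of candidate thresholds, chosen only to fit this small sample.
candidate_thresholds = [325.8, 325.9, 326.0, 326.1, 326.2]

for u in candidate_thresholds:
    n_exceed = sum(x > u for x in levels)
    e = mean_excess(levels, u)
    shown = "n/a" if e is None else f"{e:.3f}"
    print(f"u = {u:.1f}  exceedances = {n_exceed}  mean excess = {shown}")
```

Plotting the mean excess against u for the full series gives the MRL plot. The number of exceedances is printed alongside because estimates at high thresholds rest on very few points and are correspondingly unstable.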
[clustering of the extremes] Extremes tend to occur in clusters: consecutive exceedances of the chosen threshold u are considered to belong to the same cluster. For the daily water level data, the Peaks-over-Threshold plot lets us see the different clusters of extreme values. If the exceedances exhibit autocorrelation, we decluster them and fit the Generalized Pareto Distribution (GPD) to the cluster maxima.
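The clustering rule described above (consecutive exceedances form one cluster, and each cluster is represented by its maximum) can be sketched as follows. The function name `cluster_maxima` is hypothetical, and `toy_series` contains made-up values that are not taken from the data set; they are there only so that more than one cluster appears.

```python
def cluster_maxima(values, u):
    """Group consecutive exceedances of u into clusters and return each cluster's maximum."""
    maxima = []
    current = []  # exceedances of the cluster currently being built
    for x in values:
        if x > u:
            current.append(x)
        elif current:
            # A value at or below u ends the running cluster.
            maxima.append(max(current))
            current = []
    if current:
        # The series ended while still above the threshold.
        maxima.append(max(current))
    return maxima


# Made-up illustrative values (NOT from the niveau data set).
toy_series = [325.9, 326.4, 326.8, 326.1, 325.7, 325.8, 326.3, 326.2, 325.6]
print(cluster_maxima(toy_series, 326.0))

# First six daily maxima from the table above: the three exceedances of 326.0
# are consecutive, so they form a single cluster.
levels = [326.245, 326.153, 326.053, 325.871, 325.837, 325.835]
print(cluster_maxima(levels, 326.0))
```

In this simple version a single day at or below u separates two clusters. Requiring a longer gap between clusters is a common variation, but it goes beyond the rule stated here.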
[drawbacks and advantages of using block maxima method instead ]
The threshold could be set at around 326 or 327 m above sea level.